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Abstract 


The teach-learn-assess cycle in education is broken in a typical massive open online 
course (MOOC). Without formative assessment and feedback, MOOCs amount to 
information dump or broadcasting shows, not educational experiences. A number of 
remedies have been attempted to bring formative assessment back into MOOCs, each 
with its own limits and problems. The most widely applicable approach for all MOOCs 
to date is to use peer assessment to provide the necessary feedback. However, 
unmoderated peer assessment results suffer from a lack of credibility. Several methods 
are available today to improve on the accuracy of peer assessment results. Some 
combination of these methods may be necessary to make peer assessment results 
sufficiently accurate to be useful for formative assessment. Such results can also help to 
facilitate peer learning, online discussion forums, and may possibly augment summative 
evaluation for credentialing. 

Keywords: Massive open online courses (MOOCs); peer assessment; formative 
evaluation; calibrated peer review; credibility index 
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In the past several years, massive spen online courses or MOOCs have erupted 
throughout the higher education landscape worldwide. These are typically audio, video, 
and textual instructional modules delivered via the internet and are free of charge. 
Enrollments in these courses have ranged from thousands to hundreds of thousands, 
typically from all around the world with about 1/3 of the enrollees coming from the 
United States and India (Waldrop, 2013). Numerous universities have developed and 
offered MOOCs on a trial basis. Hybrid degree programs that include a combination of 
traditional and MOOC courses, such as the master’s degree in computer science 
program at the Georgia Institute of Technology in the United States, have emerged. 
Some companies that offer MOOC online platforms are attempting to license MOOC 
contents to be coupled with traditional in-class discussions and exercises provided by 
supportive instructional staff from traditional brick-and-mortar universities and offer 
the combination as blended courses. Many MOOCs are offering participants various 
recognitions for participation and completion, ranging from certificates of completion to 
online badges, to college credits. State legislators in the United States, such as those in 
California, are demanding universities accept MOOCs for credit. These rapid 
developments have given MOOCs the appearance of potentially replacing at least some 
of the traditional university resident instruction courses as well as some online courses. 

Many observers have questioned whether these MOOCs can actually replace traditional 
brick-and-mortar instruction or even established online courses (e.g., Kauza, 2014). 
Others suggest that MOOCs ‘cheapen’ higher education and threaten the survival of high 
quality programs. Proponents, on the other hand, have complained that we might be 
asking MOOCs to meet a higher standard of quality than traditional instruction. 
Additionally, they cite the rising cost of tuition for higher education, coupled with the 
decreasing average annual income of families, as an unsustainable model that MOOCs 
might help address (e.g., Barber, Donnelly, & Rizvi, 2013). 

Regardless of one’s position, the general vision regarding MOOCs is that they constitute 
individual stand-alone, completely functional units that not only serve as another open 
educational resource, but can in fact lead to massive open learning. 

From the Few to the Masses 

It might be useful to pinpoint key differences between MOOCs and traditional university 
instructional modes, including those of large lecture classes as well as non-MOOC 
online courses. Perhaps the most obvious and also most critical difference between 
MOOCs and traditional classes is scale. While the largest of the traditional classes - 
large university classes in lecture halls - may have over 1,000 students, MOOCs 
typically have tens of thousands to over 100,000 students. 

There may well be many social, economic, technological reasons for the emergence of 
MOOCs at this juncture. However, from a broad historical perspective, the development 
of MOOCs is a logical continuation of a trend in education, made possible currently by 
developments in communication technology and the internet. Historically, education as 
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a social institution has moved in a single direction: from the education of the privileged 
few to the education of the masses. This is particularly the case for higher learning and 
technical training. We started from the few teachers with few disciples (e.g., Socrates, 
Confucius, Shakyamuni, instruction in the monastaries of Taxila in India) approach; 
evolved next to a system of many teachers each having very few students (e.g., masters 
and apprentices; tutors/imams/zen masters and students); to formal educational 
institutions for aristocrats and the privileged (e.g., European universities, U.S. Ivy 
League colleges, Guozijian in China, madrasas p— j^] in Muslim countries); to 

finally mass compulsory basic education and mass higher education (e.g., land grant 
colleges, GI Bill in the U.S.) with many teachers each teaching many students. The next 
logical step in this evolution may very well be universal open education for either self- 
actualization or credentialing. MOOCs promise to be a part of this next step in 
education. 

From Teaching to Broadcasting 

As we move from education of the privileged few to education of the masses, the 
learner-to-teacher ratio is increased at every stage. Access to the teacher by students is 
reduced and the learning experience is correspondingly diluted. A most important loss 
is the reduction in the opportunity to interact with teachers. While many aspects of the 
teaching/learning experience can be approximated through technology, the opportunity 
to interact with the teacher is an inverse function of the learner-teacher ratio and cannot 
be approximated without cloning the teacher. Current large university lecture classes 
attempt to approximate this cloning via the use of teaching assistants. MOOCs are 
taking this learner-to-teacher ratio problem to yet another level. 

What is so important about student-teacher interaction? This has to do with what 
constitutes a sound educational process. The process involves a 3-step cycle: teach- 
learn-assess/feedback (cf. Frederiksen & Collins, 1989). The formative assessment or 
feedback step is critical to guide subsequent instruction and to ensure learning. In the 
earliest Socratic/Confucius mode, feedback from and to each individual student occurs 
naturally and is constant and continuous. As we moved over the ages through the 
apprentice to mass education modes, feedback to each individual student has become 
more and more sparse. Attempts have been made by some to put individual feedback 
back into the cycle by designing what is known as dynamic assessment (cf., Feuerstein, 
Feuerstein, Falik, & Rand, 2000; Haywood & Lidz, 2007). However, what that does is 
attempt to force into the system of mass education the original few-teacher, few-student 
mode and has not been practical. Hence, this approach is found primarily in special 
education, where individualized educational plans are often used or even legally 
mandated. It is also found in second language learning to some degree (e.g., Lantolf & 
Poehner, 2011). Many large lecture classes in universities also attempt to put back some 
small degree of individual feedback by breaking the class up into smaller ‘recitation’ 
sessions with teaching assistants or tutors. 
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Feedback and assessment in open and distance learning are inherently difficult to begin 
with (Chaudhary & Dey, 2013; Letseka & Pitsoe, 2013; Suen & Parkes, 1996). The 
problem of the reduction of individual feedback from, and interactions with, instructors 
becomes extreme in MOOCs. Due to the scale of MOOCs, feedback to individual 
students from the instructor has become virtually impossible. Yet, teaching without 
assessing whether the student has learned and without giving students feedback as to 
whether they have indeed learned the material correctly amounts to a one-way 
information dump or broadcasting, not education. A MOOC, in that form, would be 
essentially not different from the thousands of free how-to Youtube videos on the 
internet or the various free instructional videos provided by the Khan Academy 
(http://khanacademy.org) and cannot be considered a complete teaching-learning 
experience. 

Attempts at Remedies 

The teach-learn-assess cycle is essentially broken in a MOOC. Various attempts have 
been or are being made to re-introduce some degree of formative assessment feedback 
into the process to prevent it from becoming a one-way information dump or 
broadcasting show. 

Many methods are suitable for feedback in an open distance learning environment in 
general. These include (a) automated tutors; (b) peer feedback; (c) auto-scoring of 
assignments; (d) reflective networks; (e) written comments; (f) oral comments; (g) 
meta-verbal; (h) emoticons; (i) self-checks; and (j) ePortfolio (Costello & Crane, 2013). 
Additionally, many developments in ICT have enabled feedback and assessment 
activities analogous to those of feedback activities in a traditional classroom. However, 
only a limited subset of these methods and technology are applicable to the scale of 
MOOCs. 

In terms of assessment, some MOOCs offer online multiple-choice quizzes that are 
machine-scored as progress checks and feedback to students. At the end of each 
instructional module, a number of multiple-choice questions would be posed to the 
student. These questions are intended to gauge the student’s mastery of the concepts 
and other contents covered in that module. The scores on these tests would indicate 
whether the student has adequately learned the material and the scores are given to the 
student as feedback. Students who do not do well would be encouraged to return to the 
previous module to review the materials. This approach is basically an online version of 
the old programmed-learning approach (Bloom, 1971; Skinner, 1968), popular briefly in 
the 1960s and 70s and quite limited in applicability since it is appropriate only for 
certain types of course contents where abilities to recall or to differentiate concepts or to 
interpret or extract information from text or graphics related to the subject matter are 
the only important instructional objectives. It is also challenging to most instructors to 
develop good quality multiple choice test items to measure high-level cognition such as 
analysis, synthesis, and evaluation, in Bloom’s taxonomy. It is not appropriate for 
courses in which the desired evidence of learning is to have students demonstrate an 
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ability to generate ideas or produce a product, such as answer open-ended questions, 
write an essay, submit a report, design an artifact, engineer a process, or solve an ill- 
defined complex problem. 

For open-ended writing assignments, automated essay scoring algorithms can be used 
(Balfour, 2013). These essay scoring programs have become more and more 
sophisticated and can detect many types of error in writing and can provide automated 
feedback to inform students of errors. An example of such an algorithm is the e-rater 
system used by the Educational Testing Services in the United States to score essays in 
the SAT test (see http://www.ets.org/research/topics/as nip) . However, these 
programs are appropriate only when English writing ability is the construct of interest 
and are therefore appropriate only for MOOCs that teach English writing skills. 
Additionally, even when the objective of the course is writing ability, these programs can 
only detect errors in the more mechanical aspects of writing such as verb-noun 
agreement, run-on sentences and other grammatical or syntactical errors, and even 
organization to some extent, but are generally not capable of evaluating such abstract 
qualities as theme, humor, irony, coherence, and so on (Williamson, Xi, & Breyer, 2012; 
Zhang, 2013). 

To provide feedback to students in general, in some cases instructors would provide 
answers to a limited number of most popular questions posted in the MOOC online 
discussion forum. The popularity of each question is often determined by a system of 
like/dislike votes similar to that used in Facebook. This, of course, is quite far from 
providing individual formative feedback and leaves the overwhelming majority of 
student questions unanswered. For the majority of the students, formative assessment 
and feedback would still be missing. 

One solution that has emerged to address both the problem of the lack of formative 
feedback and that of a lack of revenue stream for the investment of resources in the 
development of MOOCs is to place MOOCs within a blended learning or flipped learning 
structure. This approach would have students view contents of a MOOC on their own 
and at their own pace. After learning the materials via the MOOC, they would attend 
local brick-and-mortar classes in which they would do assignments and participate in 
discussions with local instructors. While the MOOC portion may be free, the face-to-face 
sessions would be fee-based. Georgia Institute of Technology in the United States, for 
instance, has initiated a Master of Computer Science degree program for $6,000 to 
combine MOOCs with a large number of instructional tutors in a blended manner. 
Coursera is also attempting to license contents of existing MOOCs to be coupled with 
local instructional staff for credit-bearing courses at traditional universities. 

This blended- or flipped-learning approach appears to be a workable alternative that 
would solve the central problems of assessment, feedback, and revenue. The flipped or 
blended learning mode is fundamentally quite similar to many advanced seminars in 
universities in which students are assigned take-home readings from textbooks or 
reference materials and are then to provide reports and participate in instructor-led 


Vol 15 | No 3 


July/14 





Peer Assessment of Massive Open Online Courses (MOOCs) 


Suen 


discussions in class. This approach to the use of MOOCs would fundamentally change 
its nature and function from its original promise of offering massive free universal 
education to those of a free multimedia, interactive analog of a traditional textbook (see 
e.g., Krause, 2014). 

Finally, the single approach that is widely applicable to most, if not all, MOOCs is to use 
peer assessment and peer discussion forums to provide formative feedback to students. 
In this approach, fellow students within a MOOC are asked to evaluate student 
assignments and to provide feedback to other students. Unlike the use of multiple- 
choice quizzes or automatic essay scoring, it is applicable to all contents and 
assignments. It is also the most economical approach without the need to hire a large 
pool of support instructional tutors as in the case of blended learning models. It allows a 
MOOC to be a complete stand-alone educational tool without reducing the role of the 
MOOC to that of a multimedia interactive textbook. 

Peer Assessment and Issues 

There is a large body of literature about various aspects and effective practices of peer 
assessment in traditional classroom instruction (see Falchikov & Goldfinch, 2000; 
Gielen, Dochy, Onghen, Struyven, & Smeets, 2011; Li, Xiong, Zang, Kornbaher, Lyu, 
Chung, & Suen, 2014; Norton, 1992; Topping, 2005). In traditional classroom 
instruction, peer assessment has been commonly used to facilitate class discussions, 
often in small groups or dyads, often under the supervision and guidance of the teacher, 
and augmented by instructor assessment (Gielen et al., 2011). Peer assessment in 
MOOCs, however, exists in a very different environment. First, and most obviously, is 
the issue of scale. For a single assignment within a single MOOC, there are tens of, to 
over a hundred, thousand potential peer raters evaluating up to over a hundred 
thousand submissions. The logistics of linking raters and assignments are considerably 
more complex (Balfour, 2013). The second difference is that, because of the scale, there 
is little to no instructor mediation, supervision, or guidance. (Note that for flipped 
learning, the supervision exists in the traditional portion of the course, not within the 
MOOC.) A third difference is that MOOC participants are international. There is a large 
variation in native language, culture, value, and worldview among peer raters. Without a 
teacher overseeing the process, there is also little sense of obligation or incentive for 
students to take the peer assessment process seriously. It is, for example, known that 
MOOCs which employ peer assessments tend to have lower course completion rates 
(Jordan, 2013). It is not clear whether this low completion rate is an effect of the use of 
peer assessment or the result of asking students to submit open-ended assignment tasks 
instead of just clicking multiple choice answers. Such tasks also concomitantly 
necessitate the use of peer assessment. 

Because of these differences, peer assessment in MOOCs will need to be 1) simple and 
easy to understand for students; 2) efficient in execution without occupying much time; 
and 3) limited in that each student rater is asked to rate no more than a handful of other 
students’ assignments. In other words, peer assessment methods in MOOCs need to be 
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as scalable as MOOCs. These limitations would in turn lead to each assignment being 
rated by no more than a handful of peers realistically. The resulting assessment score 
data would then be one of a nested design with missing data in most cells. With a large 
enrollment for the course but only a handful of different peer raters per assignment, the 
distribution of rater abilities and knowledge for each assignment and between 
assignments will necessarily be uneven and imbalanced. Some assignments would be 
rated by excellent and knowledgeable raters while some would be rated by poor and 
uninspired raters. 

In its most basic form, the process of peer assessment within a MOOC would be as 
follows: A scoring rubric is developed for an assignment, the latter usually in the form of 
a project, an artifact, or a written report, within an instructional unit in a MOOC. 
Students are instructed to complete the assigned project and submit it online. Each 
project is distributed to several randomly selected fellow students by asking the fellow 
students to view the project online. Each fellow student rater is then to rate the quality 
of the project based on the predetermined scoring rubric. Raters are also asked to 
provide some written comments. The mean or median rating score is taken as the score 
for the project. The score as well as the written comments are then made available to the 
original student who submitted the project. Through this process, each project is rated 
by no more than a handful of peer raters and each peer rater would rate no more than a 
handful of projects. 

Accuracy of Peer Assessment Results and Remedies 

Perhaps the most glaring problem with peer assessment is how trustworthy the results 
are. After all, within peer assessment, the performance of a novice is being judged by 
other novices. Is it possible that peer raters misjudge the quality of the submission even 
with the guidance of the predetermined scoring rubric? Is it possible that peer raters 
judge a submission highly because the raters and the submitter share the same set of 
common but erroneous misconceptions? Or equally troubling, is it possible that a peer 
rater judges a submission as poor due to the rater’s own misconceptions about the 
subject matter? Without the mediation of an instructor, can erroneous peer assessment 
results actually harm learning? In spite of a few studies suggesting peer assessment 
results correlate well with instructor ratings in conventional classrooms as well as 
online courses for highly structured tasks with narrowly defined correct responses (e.g., 
Bouzidi & Jaillet, 2009), the doubt regarding the accuracy of peer assessment in general 
remains. Students, in particular, do not trust the results of peer assessment (e.g., 
Furman & Robinson, 2003). A similar problem exists for unmoderated peer online 
discussion forums. 

To provide a glimpse of students’ mistrust of peer assessment results or peer online 
discussions, below is a sample of comments from peer evaluators found in several 
MOOCs offered by the Pennsylvania State University in the U.S. in 2013 (Suen & Pursel, 
2014): 
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I hated the peer assessments as in some cases, their 
anonymity gave the peers an excuse to say mean-spirited 
things. 

Peer-to-peer evaluation can not replace the teaching by 
an expert. The evaluations are not deep and rich enough. 

Asking tens of thousands people to discuss online about 
anything is stupid. Letting three random Internet trolls 
(also known as peers) to decide whether one passes with 
distinction or not is moron. 

I really disliked the peer assessment. I worked very hard 
on my map and out of the reviews only one offered 
constructive criticism. The others I question if they even 
looked at my map rather than just the attached image of 
it. The comments that were made didn't even make 
sense. 

A few approaches at various stages of development have been put forth to address the 
concern for accuracy of peer assessment results in MOOCs. These include connectivist 
MOOCs (cMOOCs), the Calibrated Peer Reviews (CPR™) system, a Bayesian post hoc 
statistical correction method, and a credibility index approach. 

Connectivist MOOCs. 

The approach used by connectivist MOOCs is to remove the concern for accuracy 
altogether from peer assessment and peer discussions by deliberately designing the 
course to welcome and encourage diverse perspectives from participants. Proponents of 
this approach view assignments, projects, and online discussions as opportunities for 
crowd-sourcing, leading to superior results that otherwise cannot be achieved 
individually by the students (or the instructor). The underlying orientation of this 
approach is what is known as the connectivist pedagogy, proposed by Siemens (2005) 
and others. The idea is that knowledge is gained experientially via connections a student 
makes among nodes. As such, peer perspectives provide the necessary nodes for the 
connections. The MOOCs with this basic orientation are referred to as cMOOCs. The 
idea of peer assessment is moot within a cMOOC paradigm, as peer connections are the 
very process of learning. This approach might be quite limited in terms of potentially 
applicable course contents. Further, the connectivist pedagogy remains controversial 
today (see, for example, Kirschner & van Merrienboer, 2013). 

Calibrated Peer Reviews (CPR™). 

Another approach is to evaluate the accuracy of the ratings provided by each student 
rater and assign weights to their ratings according to their relative degree of accuracy. 
The final rating score for the submission would be a weighted average of the rating 
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scores from peer raters. This approach is exemplified by the Calibrated Peer Review 
(CPR™) developed at the University of California - Los Angeles. The CPR™ approach is 
a general purpose peer assessment approach that is readily applicable to MOOCs. It is 
inherently scalable and can be used in MOOC as well as non-MOOC settings. The 
general peer assessment process is similar to that of the basic peer assessment 
approach, with the addition of a calibration process. During calibration, each peer rater 
is to rate up to three standard essays or projects of known quality that had already been 
rated by the instructor. All peer raters would rate the same essays/projects. The 
proximity between a peer rater’s rating score and that of the instructor of the same 
essay/project is used as an indicator of the accuracy of the peer rater. This indicator is 
then used as the weight for that rater’s ratings of actual peer performances. The more 
accurate is the rater, the more weight is given to that rater’s judgment of peer 
performance. The performance score for each student submission is the weighted 
average of peer judgment scores. Many studies have demonstrated that CPR™ is an 
effective instructional tool that can help to improve students’ scientific writing skills, 
confidence in self-assessment, academic performance in physiology, patient note 
writing among medical students, and so on (e.g., Furman & Robinson, 2003; Hartberg, 
Guernsel, Simpson, & Balaster, 2008; Liklcel, 2012; McCarty, Parkes, Anderson, Mines, 
Skipper, & Grebosky, 2005; Pelaez, 2002; Reynolds & Moskovitz, 2008). However, few 
studies have been conducted to demonstrate the system can produce reliable and valid 
assessment results. 

Bayesian post hoc stabilization. 

Piech, Huang, Chen, Do, Ng, and Roller (2013), Goldin and Ashley (2012), and Goldin 
(2011) developed a number of Bayesian models to improve peer assessment results by 
imposing standard prior distributions to the ratings. The process proposed by Goldin is 
fundamentally similar to an empirical Bayes estimation process by imposing a normal 
prior within-rater distribution of rating scores as well as a normal prior between-rater 
distribution of scores. This process would produce more stable posterior peer 
assessment results, but cannot actually correct systematic errors of judgment due to 
misconceptions. This approach is shown to produce peer ratings that are more accurate 
than those from the basic peer assessment approach. Goldin (2011) found that the 
Bayesian approach reduced error of predicting instructor rating by 19-30%. The Piech et 
al. method is slightly different, but follows the same basic logical orientation. Whereas 
the CPR™ approach, as well as Goldin’s approach, define accuracy as proximity to 
instructor rating, Piech et al.’s approach defines accuracy as proximity to the mean or 
median of peer rater scores in either a unidimensional or multidimensional space. 

Credibility index. 

The credibility index approach (Suen, 2013a, 2013b; Xiong, Goins, Suen, Pun, & Zang, 
2014; see http://tlt.psu.edu/2012/07/12/peer-assessment-in-moocs-the-credibilitv- 
index/) is an attempt to improve the accuracy of peer feedback by modifying and 
refining the CPR™ method. The basic premise of the credibility index approach is that 
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errors in peer assessment results arise from at least three sources: basic error of 
judgment due to insufficient knowledge (inaccuracy), random judgmental error due to 
idiosyncratic situational factors at the time of judgment (inconsistency), and inability to 
maintain a constant level of accuracy from context to context (intransferability). 
Whereas the CPR™ method considers only the inaccuracy of the peer rater, the 
credibility index (Cl) approach takes into consideration the accuracy of the rater, the 
consistency of the rater, and the transferability of the level of accuracy between contexts 
and assignments. The approach attempts to garner the needed additional information 
without adding much more to the rater’s burden beyond what is already gathered in the 
CPR™ method. Theoretically, this approach should improve the accuracy of peer 
assessment results and there is preliminary evidence that is supportive of that claim 
(Xiong et ah, 2014). Additional research is currently underway to confirm its efficacy. If 
proved to be effective, the Cl can also be used to rank peer answers and comments in 
online discussion forums based on the Cl value of each responder, and thus is 
potentially capable of moving the system away from ranking comments based on 
popularity to one based on knowledge. 

Nature of Peer Assessment Errors 

If MOOCs are to be a complete educational experience, and not just a free multimedia 
version of traditional textbooks, the key seems to be whether there is a viable and 
scalable built-in formative assessment and feedback process. Among the various options 
available, peer assessment is the most widely applicable method to date. In spite of the 
many studies showing the efficacy of peer assessment in promoting learning, skepticism 
remains as to whether peer assessment results can be trusted. 

One source of ambiguity in evaluating the accuracy of peer assessment results seems to 
be the problem of determining what constitutes the true score. Most studies that 
attempt to evaluate accuracy have used instructor rating as the absolute standard and 
the quality of peer rating is determined by how far it departs from instructor rating. 
However, Piech et al. (2013) offer a different argument: 

For our datasets, we believe that the discrepancy 
between staff grade and student consensus typically 
results from ambiguities in the rubric and elect to use 
the mean of the student consensus on a ground truth 
submission as the true grade. 

In the case of Piech et al.’s situation, the ground truth submission was rated by 
hundreds of peer raters. Given the large number of peer raters, their decision to use the 
mean of student ratings as the ‘true score’ may be a manifestation of the trust in 
crowdsourcing. 

While the majority of studies continue to consider proximity to instructor rating as the 
gold standard of accuracy, Piech et al.’s reasoning does reflect the complexity of the 
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situation. There are at least six types of discrepancies in a peer assessment situation. 
These include: A) the discrepancy between the rating given by a peer rater and rating by 
the instructor on the same piece of work; B) the random situational fluctuations of the 
ratings to that same piece of work given by that same peer rater under different 
conditions; C) the inconsistency of ratings given to other similar pieces of work with 
similar quality but may differ in context or style; D) the random discrepancies between 
different peer raters on the same piece of work using the same set of criteria or rubric; 
E) the systematic discrepancy between different raters on the same piece of work due to 
difference in rater competence or rater leniency/stringency; and F) the random 
situational fluctuations of the ratings to the same piece of work given by the same 
instructor under different conditions. The situation is analogous to a moving archer on 
horseback shooting at a moving target. 

Rater training and a carefully constructed rubric can help reduce some of the errors 
from all sources. However, in addition to rater training and good rubrics, the different 
approaches to peer assessment discussed earlier can be viewed as attempts to tackle 
different combinations of these sources of error. The CPR™ is designed to minimize 
errors A and E in general, but the existence of other sources of error can render this 
effort ineffective for a given assessment. The Bayesian approach is designed to minimize 
error D. The Cl approach is designed to minimize errors A, B, C, & E, but does require 
slightly more information from the rater than otherwise collected by other methods. No 
method has been developed to minimize error F, except for the desirable practice of 
developing clear rubrics. The cMOOC approach would not consider these to be errors at 
all, but part of the diversity of views upon which knowledge is to be gained. 

It is theoretically possible to combine these approaches into a single most effective 
composite approach in which raters are calibrated after training via the credibility index 
approach and the resulting ratings are further refined via a Bayesian or empirical Bayes 
approach. 

Finally, one remaining problem with peer assessment in MOOCs is the probability of an 
assignment being rated by all poor raters. This problem may be minimized if the peer 
rater distribution algorithm uses a stratified sampling process based on prior 
knowledge, or credibility index value, or performance as a peer rater in previous 
assignments, instead of the current random assignment process. 

It should be noted that peer assessment, whether the results are accurate or not, is 
considered valuable as an instructional tool in its own right. Indeed, Topping (2005) 
folded peer assessment as part of a larger category of peer learning. However, accurate 
peer assessment results would further enhance this learning experience, as well as serve 
the purpose of assessment. Additionally, if can be made reasonably accurate, peer 
assessment results can be used for purposes beyond formative assessment. One such 
potential use is to facilitate online discussion forums by putting more weight on 
opinions of student raters whose judgments of peer performances are close to that of the 
instructor’s. Another potential use is to use student raters’ performance-as-raters to 
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supplement final summative evaluations of each student for the purpose of 
credentialing. The feasibility of the latter purpose, even if peer assessment results are 
made accurate, is not clear at this time - at least not clear in the United States. Based on 
the 2002 US Supreme Court unanimous ruling on the Owasso Public Schools v. Falvo 
case (2002), peer assessment as formative evaluation does not violate the 1974 U.S. law 
known as the Family Education Rights to Privacy Act (FERPA). The key basis of the 
judgment seems to be restricted to the idea that peer assessment results for formative 
purposes do not constitute part of the student’s school record. At this time, whether 
peer assessment results can be used as part of a summative grade, including 
credentialing and certification, and still not violate FERPA is not clear. 
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